
Dadi Guo

PrivacyPeek: Auditing What LLM-Based Agents Acquire, Not Just What They Say

May 29, 2026
Via arXiv

Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability

Apr 08, 2026
Via arXiv

Code2Math: Can Your Code Agent Effectively Evolve Math Problems Through Exploration?

Mar 04, 2026
Via arXiv

AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security

Jan 26, 2026
Via arXiv

The Why Behind the Action: Unveiling Internal Drivers via Agentic Attribution

Jan 21, 2026
Via arXiv

NAACL: Noise-AwAre Verbal Confidence Calibration for LLMs in RAG Systems

Jan 16, 2026
Via arXiv

MATP-BENCH: Can MLLM Be a Good Automated Theorem Prover for Multimodal Problems?

Jun 06, 2025
Via arXiv

AIDBench: A benchmark for evaluating the authorship identification capability of large language models

Nov 20, 2024
Via arXiv

Federated Domain-Specific Knowledge Transfer on Large Language Models Using Synthetic Data

May 23, 2024
Via arXiv

P-Bench: A Multi-level Privacy Evaluation Benchmark for Language Models

Nov 07, 2023
Via arXiv